In this study, we combined the behavioral and objective approaches in the field of empirical 
aesthetics. First, we studied the perception of beauty by investigating shifts in the evaluation 
of the perceived beauty of abstract artworks (Experiment 1). Because the participants showed 
heterogeneous individual preferences for the paintings, we divided them into seven 
clusters for the test. The experiment revealed a clear pattern of perceptual contrast. The 
perceived beauty of abstract paintings increased after exposure to paintings that were 
rated as less beautiful, and it decreased after exposure to paintings that were rated 
as more beautiful. Next, we searched for correlations of beauty ratings and perceptual 
contrast with statistical properties of abstract artworks (Experiment 2). The participants 
showed significant preferences for particular image properties. These preferences differed 
between the clusters of participants. Strikingly, next to color measures like hue, saturation, 
value and lightness, the recently described Pyramid of Histograms of Orientation 
Gradients (PHOG) self-similarity value seems to be a predictor for aesthetic appreciation of 
abstract artworks. We speculate that the shift in evaluation in Experiment 1 was, at least in 
part, based on low-level adaptation to some of the statistical image properties analyzed 
in Experiment 2. In conclusion, our findings demonstrate that the perception of beauty 
in abstract artworks is altered after exposure to beautiful or non-beautiful images and 
correlates with particular image properties, especially color measures and self-similarity. 

Keywords: experimental aesthetics, perceptual contrast, abstract art, beauty, digital image analysis, self-similarity, 
color 



INTRODUCTION 

The field of experimental aesthetics has attracted renewed interest in 
recent years. Two main approaches have emerged in this field. In 
one approach, the physiological and behavioral reactions of per- 
sons who view artworks are investigated. For example, in imaging 
studies, brain regions that are differentially activated by aesthetic 
visual stimuli were identified. Some of these regions belong to 
the self-reflective and reward systems of the brain (O'Doherty 
et al., 2003; Cela-Conde et al., 2004; Kawabata and Zeki, 2004). 
Other regions have previously been associated with moral judg- 
ment (Zaidel and Nadal, 2011; Avram et al., 2013) or are part 
of the default mode network (Vessel et al., 2012). At the behav- 
ioral level, researchers ask how persons perceive artworks or other 
visually pleasing stimuli in psychological experiments. 

In the other approach, statistical features that characterize 
visually pleasing stimuli are analyzed by modern computational 
methods (reviewed in Hoenig, 2005; Graham and Redies, 2010). 
A pioneer in this field was Gustav Theodor Fechner. His hypoth- 
esis of the golden section, which he advanced in his influential 
book "Vorschule der Ästhetik" (Fechner, 1876), was one of the 
first attempts to directly measure image properties that relate to 
the aesthetic quality of images. Although later studies did not con- 
firm any significant correlation between beauty ratings and par- 
ticular geometric measures (for example, see McManus, 1980), 
Fechner set the foundations for a new scientific approach, i.e., the 
systematic search for stimulus properties that are associated with 
beauty. Nowadays, image analysis is based on firmly established 
empirical methods rather than on vague intuitive grounds. For 
example, computer-assisted algorithms are used to extract image 
features that characterize aesthetic images (Datta, 2006; Graham 
and Field, 2007; Redies et al., 2007a, 2012; Amirshahi et al., 2012), 
to predict emotional responses to paintings (e.g., Yanulevskaya 
et al., 2012) or to categorize painting styles (Wallraven et al., 
2009). It is hoped that, in combination, the two approaches of 
experimental aesthetics will help us to understand what the spe- 
cific properties of aesthetic images are and how they elicit brain 
responses that correlate with aesthetic judgments by the observers 
(Redies, 2007). 

The appreciation of beauty in artworks depends to a large 
extent on cultural norms and previous exposure to art objects 
(Leder et al., 2004). Several studies suggest that people can differ 
considerably in their individual judgments (e.g., Jacobsen, 2004), 
but few studies of aesthetic responses have taken into account 
the individual differences between observers in their experimental 
design. Because of the large interindividual differences, a recent 
study by Vessel et al. (2012) used artworks as stimuli that were 
individually selected to correspond to each participant's strong 
personal preference. Interestingly, several studies showed that the 
personal aesthetic preferences of patients who suffer from dementia 
are relatively stable, even at late stages of the disease (Halpern 
et al., 2008; Graham et al., 2013; Halpern and O'Connor, 2013). 

In the present study, we focused on the aesthetic perception of 
abstract art. Because the definition of beauty is a highly disputed 
matter, especially with respect to abstract artworks, we left it to 
the participants to assess what they perceived as beautiful or 
not. To take into account individual aesthetic preferences, the 50 
participants in our study first evaluated the beauty of 150 high- 
resolution abstract paintings and were then clustered into seven 
groups, each comprising participants with a preference for similar 
paintings. 

Following this initial beauty evaluation, we carried out two 
experiments. In Experiment 1, we studied perceptual contrast 
with respect to beauty by adapting participants to cluster-specific 
subsets of their preferred and non-preferred paintings, respec- 
tively. Perceptual contrast is defined as a shift of the evaluation 
of a stimulus away from the evaluation of the preceding stimu- 
lus (e.g., see Baccus and Meister, 2004). Because individuals differ 
in their taste, we assumed that, by using individualized stimuli, 
contrast effects would be stronger than for images that were gen- 
erally preferred or disliked by all participants. In Experiment 2, we 
studied the preferred paintings with respect to color and higher- 
order statistical image properties that have previously been linked 
to aesthetic perception and correlated these properties with the 
individualized evaluation data. 

One possible explanation for perceptual contrast is visual 
adaptation. As Webster (2001) pointed out, adaptation processes 
are nothing exceptional and have been known for a long time 
(e.g., see Gibson and Radner, 1937). Adaptation is necessary 
because our environment is always changing and thus cannot 
be analyzed optimally by a visual system with fixed proper- 
ties (Webster, 2001). The first scientific studies on adaptation 
dealt with relatively simple image features, such as the color 
aftereffect or the tilt aftereffect. The discovery of long-lasting 
aftereffects led vision scientists to realize that the physiological 
correlate of adaptation must be more than simple fatigue of 
neural mechanisms or inhibitory mechanisms, as had been 
assumed before (for a review, see Thompson and Burr, 2009). 
Recent research targeted relatively elaborate stimulus features 
in adaptation studies. For example, human faces have become 
widely studied stimuli, because they are of exceptional inter- 
est for human behavior. To name just a few of the results in 
this field, researchers demonstrated adaptation to gender (Troje 
et al., 2006), age (Schweinberger et al., 2010), and attractiveness 
(Rhodes et al., 2003). 

However, visual adaptation is not the only possible 
explanation for perceptual contrast. An altered evaluation 
of the stimulus after exposure to a differing stimulus might also 
be the result of a criterion shift. Criterion shifts can be described 
as changes of the central tendencies of the participants' individ- 
ual psychometric functions (Morgan et al., 2012). The shift in 
evaluation can therefore not be taken as evidence for genuinely 
perceptual biases. 

In a recent study, Hayn-Leichsenring et al. (2013) demon- 
strated an attractiveness aftereffect for face photographs and 
art portraits. In the case of art portraits, similar afteref- 
fects were detected for beauty. In their study, attractiveness 
was defined as the physical allurement of a face whereas 
beauty was considered as a more general property of images 
and referred to the pleasure derived from the composition 
of the image (or artwork). Following this definition, the 
authors found a strong correlation of the beauty ratings 
with the attractiveness ratings for art portraits, suggesting the 
possibility that participants may confound the two features 
easily. 

In order to confirm and extend these findings, the goal of 
Experiment 1 in the present study was to explore whether percep- 
tual contrast for the perception of beauty can be demonstrated 
even in the absence of semantic content that can potentially 
confound the assessment of beauty. The existence of an after- 
effect on beauty might possibly have an evolutionary advan- 
tage. The ability to adjust one's perception to the currently 
relevant range of beauty in the environment can be a criti- 
cal benefit, as it improves the differential appraisal of actual 
stimuli. There are indeed hints that a long-term adaptation 
to aesthetic features exists (Carbon, 2011). To our knowl- 
edge, however, short-term aftereffects on artworks have been 
barely investigated to date. The only publication on this topic 
was restricted to a single style of painting (Carbon et al., 
2007). 

In the present experiment, we studied contrast effects on 
abstract images that do not depict recognizable objects. Although 
we cannot exclude that participants projected some imaginary 
content into the paintings, the influence of individual preferences 
for depicted objects or scenes is minimized by using abstract 
paintings so that beauty (as a formal property of the paint- 
ings) can be rated relatively independent of a preference for the 
semantic content of the paintings. 

In Experiment 2, we focused on statistical image features that 
were previously analyzed in studies of aesthetic images. The image 
features were calculated for the abstract artworks that were rated 
by the participants in Experiment 1. The resulting values were 
correlated with beauty ratings in order to test their usefulness 
for predicting individual aesthetic preferences. Additionally, 
we investigated the relation of the physical image properties to 
the adaptation and the clustering of the participants. In the fol- 
lowing sections, the image features studied will be introduced 
briefly. 

SELF-SIMILARITY 

The property of self-similarity implies that an object as a whole 
has an appearance similar to its parts. Closely related con- 
cepts are scale-invariance and fractality (Taylor et al., 2011). For 
example, subsets of aesthetic monochrome artworks possess a 
scale-invariant Fourier spectrum, which means that the rela- 
tive strength of coarse and fine structures changes little as one 
zooms in and out of the image (Graham and Field, 2007; Redies 
et al., 2007b; Taylor et al., 2011; Melmer et al., 2013). Amirshahi 
et al. (2012) studied self-similarity in images of artworks directly 
by using a modern computational approach, the Pyramid of 
Histograms of Orientation Gradients (PHOG) method (Bosch 
et al., 2007). 

COMPLEXITY 

Berlyne (1974) related complexity to factors such as the regularity 
of the pattern, the number of elements that form the scene, 
their heterogeneity, or the irregularity of the forms. Berlyne's idea 
that a high aesthetic appeal goes along with an intermediate level 
of complexity is still considered valid today (for a review, see 
Nadal, 2007), although the range of complexity values observed 
in artworks is rather wide (Redies et al., 2012). 

ANISOTROPY 

According to Koch et al. (2010), large subsets of Western artworks 
tend to possess a more isotropic Fourier spectrum than their cor- 
responding real-world models. Similar findings were obtained by 
Redies et al. (2012), who described that overall gradient strength 
is more uniformly distributed across orientations in large subsets 
of colored artworks of Western provenance, compared to other 
categories of images. The perceptual significance of this finding 
remains unclear at present. 

BIRKHOFF-LIKE MEASURE 

According to Birkhoff (1933), aesthetic value depends on the ratio 
of order and complexity. Following this idea, we substituted order 
by self-similarity to obtain a Birkhoff-like measure, as described 
in Redies et al. (2012). 

ASPECT RATIO 

Modern research has shown no evidence that a certain aspect ratio 
might be preferred over others, but points out the need for mea- 
surements on different types of images to answer this question 
more comprehensively (McManus, 1980; Russell, 2000). In the 
present study, we therefore correlated the beauty ratings with the 
aspect ratios of the abstract paintings. 

COLOR MEASURES 

The influence of color on aesthetic appreciation has been empha- 
sized before, particularly in approaches that use computational 
methods for quantifying the aesthetic quality of photographs 
and paintings (Li and Chen, 2009; Marchesotti et al., 2011). 
Forsythe et al. (2011) stressed the importance of color as a 
medium that artists use to communicate with the observer. 
Yanulevskaya et al. (2012) investigated emotional response pat- 
terns to certain colors. They revealed a correlation of bright 
colors and smooth lines with positive emotions, as opposed to 
dark colors and chaotic texture that go along with negative emo- 
tions. Palmer and Schloss (2010) explained such correlations 
with their ecological valence theory, stating that color prefer- 
ences are associated with preferred objects of the same colors. 
For example, fresh fruits are mostly of bright color, while rotten 
food and excrement are normally of a dark brownish color 
that most people naturally avoid. To date, several different 
color measures have been applied to colored artworks and 
photographs (e.g., see Datta, 2006). In the present work, we cal- 
culated color measures in 3 different color spaces (RGB, HSV, and 
Lab) that have been used in aesthetic quality assessment of images 
previously. 

Our study thus combines the two approaches in experimen- 
tal aesthetics mentioned above (behavioral and computational). 
In Experiment 1, we investigate, at the behavioral level, how the 
aesthetic judgments change after the viewing of most beauti- 
ful and least beautiful images. In Experiment 2, we ask which 
objective statistical properties in the same set of aesthetic images 
correlate with the beauty judgments. In both experiments, indi- 
vidualized evaluations of beauty are explicitly incorporated into 
the experimental design. 



METHODS 

In Experiment 1, participants took part in two tests to investi- 
gate adaptation to images evaluated as most and least beautiful, 
respectively. In Experiment 1A, each participant rated the abstract 
images according to his or her own personal concept of beauty. In 
Experiment 1B, participants were exposed to paintings that they 
considered to be either of high or low beauty. Subsequently, participants 
rated some of the images used in Experiment 1A again. 
For these images, the initial ratings (Experiment 1A) and the ratings 
that were given in Experiment 1B were compared to detect 
perceptual contrast. 

EXPERIMENT 1A: BEAUTY RATING OF THE IMAGES 

Participants 

Fifty participants (19-44 years old, M = 22.7 years old, 13 males) 
took part in this study. Most of them were students, in particular 
of medical sciences, but other fields of study and professions 
were also reported. None of them had received professional 
training in the fine arts. All participants declared having nor- 
mal or corrected-to-normal visual acuity and gave their written 
informed consent after receiving an explanation of the proce- 
dures. The study design complied with the ethical guidelines 
of the Declaration of Helsinki and was approved by the ethics 
committee of Jena University Hospital. 

Stimuli 

One hundred and fifty images of abstract paintings or drawings were 
scanned from different art books. We chose only abstract art- 
works, which did not carry any clear semantic content and did not 
depict any recognizable objects. Abstract artworks were selected 
to minimize the influence of a preference for image content on the 
evaluation of the images. The artworks are listed in the Appendix 
and were from a variety of abstract painters of the 20th and 21st 
century and from different cultural backgrounds of the Western 
hemisphere. A maximum of six artworks was included from each 
artist in order to decrease the influence of any preference or aver- 
sion for a given painter on the results. An effort was made to select 
artworks from art books as randomly as possible, regardless of 
personal preference by the authors. 

Digitization of the images was carried out with a commercial 
color scanner (Perfection 3200 Photo, Seiko Epson Corporation, 
Nagano, Japan) in RGB color format. Care was taken that the 
images scanned were of high quality and did not contain obvious 
artifacts like paper folds or stains. Moreover, only pictures of a size 
that enabled high-quality scans were chosen. No image enhance- 
ment algorithms were applied. All pictures were reduced in size to 
1024 pixels on the longest side by isotropic bicubic interpolation 
for display on the screen, on which stimuli were presented at a size 
of 165 mm (10.5° of visual angle). 

Procedure 

Images of all artworks were shown separately and in a random 
sequence on a black screen (ColorEdge CG241W LCD monitor, 
EIZO Europe, Germany). A chin rest ensured a constant viewing 
distance of 90 cm. The participants were asked to rate the art- 
works on a scale from 1 (most beautiful) to 4 (least beautiful), 
which reflected the grading scheme in the German school system. 
In the course of the trial, every participant had to evaluate each 
picture once. 

The experiment was performed using the MATLAB program 
(version R2008A). The schedule of Experiment 1A is depicted in 
Figure 1A. Prior to presenting each image, a question mark was 
displayed (500 ms), followed by the image itself (600 ms) and a 
period of 1900 ms, during which a black screen was displayed and 
the participants were asked to rate the beauty of the pictures by 
pressing one of four keys labeled "1" to "4." We used a relatively 
short time period of 600 ms (see also Hayn-Leichsenring et al., 
2013) because this study focuses on perceptual rather than on 
cognitive effects. Moreover, the relatively short presentation times 
decreased the likelihood that participants perceived spurious con- 
tent in the abstract images or projected imaginary content into 
them. After every 30 images, the participants were allowed to take 
a short break. 

EXPERIMENT 1B: PERCEPTUAL CONTRAST ON MOST AND LEAST 
BEAUTIFUL IMAGES 

Participants 

Forty-two participants (19-44 years old, M = 22.7 years old, 9 
males), who had attended Experiment 1A about 5 weeks before, 
took part in this trial. 

As the evaluation on beauty was quite heterogeneous among 
the participants, i.e., groups of participants showed a specific 
"taste" in their evaluation, we chose to perform the experiment 
(Experiment 1B) with seven clusters. To create these clusters, 
the data of all participants were subclassified with the k-means 
clustering method according to their individual ratings on beauty. 
Clustering allowed allocation of participants to subgroups that 
resembled each other in their individual preference. The clusters 
were created for two purposes. Firstly, we used them to carry out 
the experiment on perceptual aftereffects (Experiment 1B) with 
paintings that were preferred or non-preferred by small groups of 
participants. We expected that effects would be larger if the adap- 
tors closely corresponded to the beauty preferences of each par- 
ticipant. Secondly, the formation of clusters allowed us to search 
for correlations between image properties and cluster-specific rat- 
ings and to ask whether individual patterns of beauty preferences 
were associated with particular image properties (Experiment 2). 
A statistical analysis of the sum of squares (within clusters), the 
Bayesian information criterion and Dunn's index did not provide 
any consistent indication of which number of clusters (between 
2 and 10) would be optimal. Therefore, we chose to carry out 
the experiment with seven clusters and a minimum number of 
4 participants in each cluster. A larger number of clusters would 
have resulted in exceedingly small numbers of participants (<4) 
in each cluster. 

The seven clusters used for Experiment 1B consisted of ten 
participants (1 cluster), nine participants (1 cluster), seven par- 
ticipants (3 clusters), six participants (1 cluster), and four par- 
ticipants (1 cluster), respectively. For the statistical analysis, we 
divided participants into 3-7 clusters in order to investigate the 
stability of the clusters and their statistical properties as the 
number of clusters increases. 
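
The clustering step described above can be illustrated with a minimal sketch. It assumes plain Lloyd-style k-means with squared Euclidean distance on each participant's vector of ratings; the function name `kmeans`, the random initialization and the toy data are illustrative assumptions and do not reproduce the MATLAB routine used in the study.

```python
import random

def kmeans(ratings, k, n_iter=100, seed=0):
    """Cluster participants by their rating vectors (plain Lloyd's algorithm).

    ratings: list of equal-length lists, one list of image ratings per participant.
    Returns one cluster label (0 .. k-1) per participant.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly chosen participants.
    centroids = [list(r) for r in rng.sample(ratings, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(r, centroids[c])))
            for r in ratings
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are as well
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(ratings, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data: two participants who like the images, two who dislike them.
toy_ratings = [[1, 1, 1], [1, 1, 2], [4, 4, 4], [4, 4, 3]]
toy_labels = kmeans(toy_ratings, k=2)
```

Because k-means depends on its starting centroids and can settle in a local optimum, repeating the run with several seeds is a common precaution when the number of clusters is varied.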

Stimuli 

For the adaptation phase, we chose the 15 images that were rated 
to be the most beautiful and the least beautiful, respectively, 
within each cluster. Thus, every participant adapted to a set of 
artworks that came close to his or her individual assessment of what are 
the most and least beautiful images. Examples of the artworks 
with generally high and low ratings are shown in Figure 2. 

FIGURE 1 | Schedule for Experiment 1A (A) and Experiment 1B (B). In the second part of the experiment, an adaptation phase preceded the evaluation 
phase and the adaptation was reconditioned by presentation of two adaptor images before the display of each test stimulus. 

FIGURE 2 | Examples of paintings that were rated as most beautiful (A-C) and least beautiful (D-F), respectively. The paintings are (A) 
"Movement in Space" by Michail Matjuschin (1917/18). (B) Abstract painting by Gerhard Richter [1977. © Gerhard Richter Images, Köln, 2013]. (C) "Mit 
aphoristischem Rot" by Ernst Wilhelm Nay [1954. © Succession Nay, VG Bild-Kunst, Bonn, 2013]. (D) "Black Square" by Kasimir Malewitsch (1923). 
(E) "Joie de terre" by Jean Dubuffet (1959; Succession Dubuffet, VG Bild-Kunst, Bonn, 2013); and (F) "Alter Klang" by Paul Klee (1925). 

In the evaluation phase, we used the 60 images that 
had received an average rating for their particular cluster in 
Experiment 1A. Images of the evaluation set were not part of 
the assortment used for adaptation. Size and manner of the 
presentation of the images during the adaptation phase were the 
same as in Experiment 1A. The pictures used for evaluation had a 
reduced size of 115 mm on the longest side (7.3° of visual angle, 
720 pixels on the screen). We resized the images to investigate 
adaptive coding that is not exclusively a property of early stages 
of visual processing (Clifford et al., 2007). 

Procedure 

As depicted in Figure 1B, the experimental trial consisted of an 
adaptation phase followed by an evaluation phase. For adapta- 
tion, the participants were asked to look at the 15 images that were 
evaluated to be the most beautiful (and in a second experimental 
block the least beautiful) in their respective cluster. The images 
were shown consecutively and in a repeating manner. Images were 
shown for 3000 ms three times each, so that the entire phase lasted 
2 min and 15 s. In the subsequent evaluation phase, the adaptation 
was reconditioned before the presentation of each target 
image by showing two images that were randomly selected from 
the 15 images considered most beautiful (2 x 3000 ms; or least 
beautiful, respectively). After reconditioning, a question mark 
appeared (800 ms) followed by the target image (600 ms). Then, a 
black screen was presented for 1900 ms, during which the partic- 
ipants were asked to respond as described in Experiment 1A. In 
order to prevent a bias caused by the sequence of the adaptors, 
half of the participants adapted first to the most beautiful 
images and half to the least beautiful ones. In both experimental 
blocks of Experiment 1B (adaptation to most beautiful and least 
beautiful images), the participants evaluated all the 60 images, 
which had previously received an average rating (see above). 
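
The timing of this procedure can be checked with a short sketch. The durations are taken from the description above; the function names are hypothetical, and drawing the two reconditioning adaptors without replacement is an assumption, because the text only states that they were randomly selected.

```python
import random

ADAPTOR_MS = 3000        # duration of each adaptor presentation
QUESTION_MARK_MS = 800   # question mark before the target
TARGET_MS = 600          # target image
RESPONSE_MS = 1900       # black screen during which the rating is given

def adaptation_phase(adaptors, repeats=3):
    """Adaptor sequence: the whole set shown consecutively, `repeats` times."""
    return [img for _ in range(repeats) for img in adaptors]

def evaluation_trial(target, adaptors, rng):
    """One evaluation trial as a list of (stimulus, duration in ms) pairs."""
    top_up = rng.sample(adaptors, 2)  # assumption: two distinct adaptors
    return ([(a, ADAPTOR_MS) for a in top_up]
            + [("?", QUESTION_MARK_MS), (target, TARGET_MS), ("blank", RESPONSE_MS)])

adaptors = [f"adaptor_{i:02d}" for i in range(15)]
adaptation_ms = len(adaptation_phase(adaptors)) * ADAPTOR_MS
trial = evaluation_trial("target_01", adaptors, random.Random(1))
trial_ms = sum(duration for _, duration in trial)
```

With these values, `adaptation_ms` equals 135,000 ms, which matches the stated 2 min and 15 s, and each evaluation trial lasts 9,300 ms.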

EXPERIMENT 2: IMAGE ANALYSIS 

As mentioned for Experiment 1B (section Participants), we 
obtained 3-7 clusters of the participants with respect to their 
evaluation results of the 150 color images of abstract paintings 
(Experiment 1A) using the k-means method. This corresponded 
to a total of 25 different clusters (3 + 4 + 5 + 6 + 7; i.e., cluster 1/3, 2/3, ..., 7/7). 
The aim of Experiment 2 was to identify correlations between 
the beauty ratings and perceptual contrast effects, respectively, 
with a variety of statistical image properties that were previously 
associated with aesthetic stimuli (see Introduction). 

For the statistical analysis, all images were down-sampled to 
a uniform size of 100,000 pixels because some of the scanned 
images contained halftone dots that were visible at higher res- 
olutions. This artifact would have affected the calculation of 
self-similarity (section Self-similarity) in particular. 

Self-similarity, complexity, anisotropy, and the Birkhoff-like 
measure were calculated using the PHOG method, a computa- 
tional method that was originally developed for object classifi- 
cation in images (Bosch et al., 2007). This method was used to 
measure statistical properties of photographs and artworks before 
(Amirshahi et al., 2012; Redies et al., 2012). The analysis was car- 
ried out using MATLAB 2008A. We recently described the method 
in detail in the appendix to the open-access publication by Braun 
et al. (2013); see also Redies and Groß (2013). 

In brief, the method is based on a pyramid approach. First, 
the HOG feature (Dalal and Triggs, 2005) for the entire image 
was calculated at the ground level (level 0). The HOG feature represents 
the histogram of the mean strength of the luminance 
gradients binned in 16 equally sized orientation bins that cover 
the full 360° of orientations in the image. In the second step, 
the image was divided into 4 rectangles of the same size and the 
HOG features were calculated again for each rectangle (level 1). 
Each of the 4 subimages was again divided into 4 equal rectangles 
and the HOG features were calculated for the resulting 16 
subimages as well (level 2). We took this approach up to level 3 
(64 subimages). Within each HOG feature, the strengths of the binned gradients 
were normalized. For the analysis of the color images, the 
images were converted to the Lab color space. For each pixel in 
the color image, the maximum gradient magnitude in the L, a 
and b color channels was used for the HOG calculation (gradient 
image). 
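
The pyramid described above can be sketched as follows. This is a simplified illustration and not the published implementation: it assumes a single-channel image given as a list of rows (instead of the maximum gradient over the L, a and b channels), central differences for the gradients, accumulation of summed gradient magnitude per orientation bin, and normalization of each histogram to a sum of 1. The function names `hog` and `phog` are hypothetical.

```python
import math

def hog(image, top, left, height, width, n_bins=16):
    """Normalized orientation histogram of gradient strength for one rectangle."""
    hist = [0.0] * n_bins
    rows, cols = len(image), len(image[0])
    for y in range(top, top + height):
        for x in range(left, left + width):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, cols - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, rows - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)  # orientation over the full 360 degrees
            bin_index = int(angle / (2 * math.pi) * n_bins) % n_bins
            hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def phog(image, max_level=3, n_bins=16):
    """HOG features for levels 0 .. max_level; level l holds 4**l histograms."""
    rows, cols = len(image), len(image[0])
    pyramid = []
    for level in range(max_level + 1):
        n = 2 ** level  # number of rectangles per image side at this level
        level_hists = []
        for i in range(n):
            for j in range(n):
                top, bottom = i * rows // n, (i + 1) * rows // n
                left, right = j * cols // n, (j + 1) * cols // n
                level_hists.append(hog(image, top, left, bottom - top, right - left, n_bins))
        pyramid.append(level_hists)
    return pyramid

# Toy image: a horizontal luminance ramp, so all gradients share one orientation.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
pyramid = phog(ramp)
```

For the ramp, every histogram has all of its weight in the first orientation bin, and the pyramid contains 1, 4, 16 and 64 histograms at levels 0 to 3. Note that normalized histograms no longer carry the overall gradient strength, so a measure based on summed gradient strength would have to be taken before normalization.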

Self-similarity 

For calculating self-similarity, we compared the HOG features of 
the subimages on the third level with the HOG features of the 
entire image on level 0 using the Histogram Intersection Kernel 
(Barla et al., 2002). The third level proved to deliver the most 
reliable results in previous work because the differences between 
the subimages are more significant than at the lower level and 
yet robust and reliable (Amirshahi et al., 2012; Redies and Groß, 
2013). 
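
As a hedged illustration of this comparison, the sketch below computes the Histogram Intersection Kernel between each level-3 histogram and the level-0 histogram. Summarizing the 64 kernel values by their median is an assumption made here for illustration; the text above does not state how the values were aggregated. The function names are hypothetical, and the histograms are assumed to be normalized to a sum of 1.

```python
from statistics import median

def histogram_intersection(h1, h2):
    """Histogram Intersection Kernel: sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def self_similarity(ground_hist, level3_hists):
    """Assumed summary: median intersection of the subimages with the whole image."""
    return median(histogram_intersection(h, ground_hist) for h in level3_hists)

# Subimages that look like the whole image give a value of 1,
# subimages with entirely different orientations give 0.
identical = self_similarity([0.5, 0.5], [[0.5, 0.5]] * 4)
disjoint = self_similarity([1.0, 0.0], [[0.0, 1.0]] * 4)
```

With normalized histograms the kernel lies between 0 and 1, so higher values indicate that the parts of an image resemble the image as a whole.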

Complexity 

In the gradient image, the sum of the strengths of all oriented 
gradients, which correspond to edges or lines with different orien- 
tations in the image, was used as a measure of complexity (Redies 
et al, 2012). This measure is highly correlated (Braun et al., 
2013) with another measure of complexity, the fractal dimension 
(Mureika and Taylor, 2013). 

Anisotropy 

The standard deviation of the luminance gradient strengths in the 
16 orientation bins at level 3 served as a measure of anisotropy. 
This property describes how far the distribution of the oriented 
gradient strengths across all orientations deviates from a uniform 
distribution. A value close to zero indicates a uniform distribution 
of the orientation gradients across orientations. 
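
A minimal sketch of this measure is given below. It assumes that the standard deviation is taken over all bin values of all level-3 histograms pooled together and that the population form of the standard deviation is used; both choices are assumptions, as is the function name `anisotropy`.

```python
from statistics import pstdev

def anisotropy(level3_hists):
    """Standard deviation of gradient strength across the pooled orientation bins."""
    pooled = [value for hist in level3_hists for value in hist]
    return pstdev(pooled)

uniform = anisotropy([[0.25, 0.25, 0.25, 0.25]] * 2)  # same strength in every bin
oriented = anisotropy([[1.0, 0.0, 0.0, 0.0]])         # one dominant orientation
```

Here `uniform` is zero, in line with the statement that a value close to zero indicates a uniform distribution, while `oriented` is clearly larger.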

Birkhoff-like measure 

A Birkhoff-like measure was defined as the quotient of self- 
similarity over complexity, as introduced by Redies et al. (2012). 

Aspect ratio 

This measure is the quotient of height and width of an image. 

Color measures (HSV, RGB, and Lab color channels) 

Color features were analyzed in three different color spaces (HSV, 
RGB, and Lab) with MATLAB and ImageJ (Abramoff et al., 2004). 
The original RGB-coded images were converted into the HSV 
color space with the MATLAB program and into the Lab color 
space with the Photoshop program (Adobe, Mountain View, CA). 
Subsequently, the strength (average pixel value) of each color 
channel was calculated. 
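
The channel averages can be illustrated for the RGB and HSV color spaces with a short sketch. It relies on the `colorsys` module of the Python standard library rather than on the programs named above, so individual values may differ from those obtained in the study. The Lab channels are omitted because their conversion depends on a reference white point that is not stated here. The function name `mean_channels` is hypothetical.

```python
import colorsys

def mean_channels(pixels):
    """Average value of each RGB and HSV channel.

    pixels: list of (r, g, b) tuples with components in 0..255.
    RGB means are returned on the 0..255 scale, HSV means on the 0..1 scale.
    """
    sums = {"R": 0.0, "G": 0.0, "B": 0.0, "H": 0.0, "S": 0.0, "V": 0.0}
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        for name, value in zip("RGBHSV", (r, g, b, h, s, v)):
            sums[name] += value
    n = len(pixels)
    return {name: total / n for name, total in sums.items()}

pure_red = mean_channels([(255, 0, 0), (255, 0, 0)])
gray = mean_channels([(128, 128, 128)])
```

Hue is an angular quantity, so its arithmetic mean should be interpreted with care: reddish hues lie at both ends of the 0..1 scale.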

RESULTS 

BEAUTY RATING OF THE IMAGES (EXPERIMENT 1A) 

For each image, the mean beauty rating was calculated for all 
participants. The mean rating ranged from 1.92 (for the most 
positively rated, i.e., "most beautiful") to 3.6 (for the most nega- 
tively rated, i.e., "least beautiful") [Mean (M) = 2.88, SD = 0.37]. 
Examples of paintings that received generally high and low beauty 
ratings, respectively, are shown in Figure 2. The distribution of 
the ratings for all paintings is shown in Figure 3. The average rat- 
ing scores of individual participants ranged from 1.97 to 3.58. We 
used the results of Experiment 1A to cluster the participants into 
7 subgroups (see section Participants). 




FIGURE 3 | Results of the rating experiment (Experiment 1A). Frequency represents the number of images that received the respective 
average rating. 



PERCEPTUAL CONTRAST ON MOST AND LEAST BEAUTIFUL IMAGES 
(EXPERIMENT 1B) 

In this experiment, we obtained two ratings from each participant 
for each image: one after exposure to most beautiful images and 
one after exposure to least beautiful images. The average rating 
was 2.84 (SD = 0.19) after exposure to most beautiful images and 2.68 
(SD = 0.20) after exposure to least beautiful images, respectively. 
In all clusters (with the exception of cluster 5), images were rated 
as more beautiful after exposure to least beautiful stimuli, when 
compared to the ratings after exposure to most beautiful images. 
A paired t-test across images confirmed the significance of this 
perceptual contrast effect (R = 0.808; df = 131; t = -10.468; 
p < 0.001). Results for all seven clusters are shown in 
Figure 4. 
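
For readers who wish to reproduce this kind of comparison, the paired t statistic can be sketched as follows. The ratings in the example are invented toy values and not data from the study, and the function name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched rating lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev is the sample SD
    return t, n - 1

# Invented toy ratings of four images under two adaptation conditions.
after_least_beautiful = [1, 2, 3, 4]
after_most_beautiful = [2, 2, 5, 5]
t_value, df = paired_t(after_least_beautiful, after_most_beautiful)
```

The sign of t depends only on the order in which the two conditions are passed; a p-value would additionally require the t distribution, which the standard library does not provide.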

STATISTICAL IMAGE ANALYSIS (EXPERIMENT 2) 

Over all participants (without clustering), we found significant 
correlations of beauty ratings with the following color measures: 
the hue channel (Spearman coefficient ρ = 0.182, p < 
0.05; Figure 5B), the saturation channel (ρ = -0.217, p < 0.01; 
Figure 5C), and the value channel (ρ = -0.277, p < 0.01; 
Figure 5D) in the HSV space; the green color channel in the 
RGB color space (Spearman ρ = -0.217, p < 0.01); and the 
luminance (L) channel (ρ = -0.206, p < 0.05; Figure 5E) 
and the yellow-over-blue channel (b channel) in the Lab color 
space (ρ = -0.224, p < 0.01; Figure 5G). The other statistical 
measures, for example self-similarity (Figure 5A) and the 
red-over-green channel (a channel) in the Lab color space 
(Figure 5F), did not correlate with overall beauty ratings. We 
next analyzed the data for each cluster of participants separately. 
Results are listed in Table 1. The correlations between the most 
eminent statistical features and the beauty ratings are depicted in 
Figure 6 for each cluster. 
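
The rank correlations reported above can be illustrated with a short,
dependency-free Python sketch of the Spearman coefficient (the Pearson
correlation of average ranks, with ties sharing their mean rank). The
function names and the example values are assumptions made for
illustration; they are not the analysis code or data of the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical image property values and mean ratings for five images.
saturation = [0.10, 0.35, 0.50, 0.80, 0.65]
mean_rating = [3.4, 3.0, 2.9, 2.1, 2.5]
rho = spearman_rho(saturation, mean_rating)
print(f"rho = {rho:.3f}")
```

Running the same function once per cluster and per image property would
yield a table of coefficients of the kind summarized in Table 1.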

Although there was no correlation over all participants 
(Figure 5A), we found significant correlations (p < 0.05 or
lower) for self-similarity with beauty ratings in 10 of 25 clusters.
Correlations were in both directions. For example, in the
analysis with 6 clusters (Figure 6D), we found 4 clusters with
a significant correlation, of which 2 were positive and 2 negative
(ρ = 0.413, p < 0.01; ρ = 0.179, p < 0.05; ρ = -0.251,
p < 0.01 and ρ = -0.188, p < 0.05; Table 1). Of the other 6
significant correlations of self-similarity with beauty ratings
(Figures 6A-C,E), 5 were positive and 1 negative. Note that a
positive value implies that more self-similar images were rated as less
beautiful.

FIGURE 4 | Average ratings on beauty of abstract images after
adaptation to most beautiful and least beautiful stimuli in Experiment
1 for each cluster (numbered 1-7). Error bars indicate standard deviation.
All differences between adaptations to most beautiful and least beautiful
stimuli were significant (p < 0.01). The numbers at the bottom of the
columns indicate the number of participants in each cluster.

Other correlations with beauty ratings were found for color 
measures (Table 1). In the HSV color space, hue showed corre- 
lations in 6 clusters, and both saturation and value in 13 of 25 
clusters (Figure 6). The correlations of color saturation and color 
value with the beauty rating were nearly all positive with one 
exception in one cluster, in which a low saturation was preferred. 
In the RGB color space, we found highly significant correlations 
with red (13 clusters), green (13 clusters), and blue (7 clusters). 
Figure 6 shows the results from the Lab color space only because 
the results can be interpreted more easily in terms of preferences 
for specific colors. In the Lab color space, there were preferences 
for lightness (12 clusters), as well as for red over green (5 clus- 
ters), for green over red (1 cluster), and for yellow over blue (10 
clusters). 

According to these correlations, we can characterize the clusters
as follows: Members of clusters 1/3, 1/4, 2/5, 3/6, and 6/7
(shaded in blue in Figure 6) tend to prefer bright images
of a comparatively low self-similarity and a high color value. In
addition, a significant preference for highly saturated colors and for
green color shades over red ones can be found in cluster 3/6. In
the group of clusters shaded in yellow in Figure 6 (clusters
2/3, 3/4, 1/5, 4/6, and 1/7), low self-similarity is a crucial factor for
only 2 out of the 5 clusters. Furthermore, nearly all of these clusters
show a preference for bright and highly saturated images with
a high color value, and they favor reddish over greenish colors as
well as yellowish over bluish ones. The third cluster group (shaded
in red in Figure 6) comprises the clusters 3/3, 4/4, 5/5,
6/6, and 2/7. All of these clusters show a significant preference
for highly saturated colors and, with the exception of cluster 4/4,
also for yellow over blue color shades. A preference for highly
self-similar images was detected in clusters 2/4, 5/6, and 6/6.

FIGURE 5 | Dot plots of the average beauty rating plotted as a function
of self-similarity (A), color hue (B), color saturation (C), and color value
of the HSV color space (D), and the three channels of the Lab color
space (E-G). Each dot represents one of 150 images rated in Experiment 1A.
The lines represent regression lines for the significant correlations only
(B-E, G). For self-similarity and the red-over-green (a) channel of the Lab
space (F), correlations were found for individual clusters (see Figure 6), but
not for the group of all participants shown in this figure.

FIGURE 6 | Spearman coefficients for the correlations between
image properties and beauty ratings. For the color coding of the
different image properties, see the explanation on the right-hand
side of (C). (A-E) show the results for the different numbers of
clusters (3-7 clusters, respectively). The colored backgrounds
highlight clusters of similar preferences, independent of how many
clusters (3-7) were formed. *Indicates significant correlations
(p < 0.05 or lower).

For each of the other features (complexity, anisotropy, the
Birkhoff-like measure, and the aspect ratio), we found either no
significant correlation or only a single one (p < 0.05).

Color hue is a circular measure. Therefore, we performed an 
additional analysis after splitting the hue values into six groups, 
each reflecting one color range (red, yellow, green, cyan, blue, and 
magenta). We did not find a significant preference for a specific 
color in any of the clusters. 
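
To illustrate why hue needs special treatment, the following Python sketch
assigns hue angles to six color ranges and computes a circular mean. The
bin boundaries (60° sectors centered on red at 0°, yellow at 60°, and so
on) and the function names are assumptions made for this example; the
exact boundaries used in the analysis are not stated here.

```python
import math

HUE_NAMES = ["red", "yellow", "green", "cyan", "blue", "magenta"]


def hue_group(hue_degrees):
    """Map a hue angle in degrees to one of six 60-degree color ranges.

    Assumed convention: each range is centered on its nominal hue, so red
    covers [330, 360) and [0, 30), which wraps around the circle.
    """
    index = int(((hue_degrees + 30.0) % 360.0) // 60.0)
    return HUE_NAMES[index]


def circular_mean_deg(hues):
    """Mean direction of hue angles (degrees), respecting the wrap at 360."""
    s = sum(math.sin(math.radians(h)) for h in hues)
    c = sum(math.cos(math.radians(h)) for h in hues)
    return math.degrees(math.atan2(s, c)) % 360.0


# Two reddish hues on either side of the 0/360 boundary.
reddish = [350.0, 10.0]
print([hue_group(h) for h in reddish])
print(sum(reddish) / len(reddish))   # arithmetic mean lands on cyan (180)
print(circular_mean_deg(reddish))    # circular mean stays at red (0 mod 360)
```

The arithmetic mean of 350° and 10° is 180° (cyan), although both hues
are red, which is why a linear correlation with raw hue values can be
misleading.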

In addition to our analyses over participants, we performed
two multivariate linear regression analyses across images to evaluate
to what extent the average beauty rating and the perceptual
contrast were predictable based on the statistical image properties.
In the first analysis, we considered the average beauty rating
as the dependent factor and the statistical image properties as
independent factors. We found that higher values in the Lab-a channel
led participants to rate images as less beautiful (standardized
β = 0.206, p < 0.05). For the Lab-b channel, higher values
led participants to rate images as more beautiful (standardized
β = -0.284, p < 0.05). Overall, the analysis revealed a low
predictability of the system (R² = 0.134). In the second multivariate
linear regression analysis, the perceptual contrast was considered
as the dependent factor, while the statistical image properties were
used as independent factors. We found a significant positive effect
of PHOG self-similarity on the perceptual contrast (standardized
β = 0.203, p < 0.05). Overall, the analysis revealed a low
predictability of the system (R² = 0.084; see Table 2 for complete
results).
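
As a rough sketch of how standardized regression coefficients and R² of
this kind can be obtained, the Python snippet below z-scores all
variables and solves the normal equations on the resulting correlation
matrix. The helper names, the predictor labels, and the numbers are
hypothetical; the sketch does not reproduce the statistics software or
the data used in the study.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_betas(predictors, outcome):
    """Return ({name: beta}, R^2) for a multiple linear regression."""
    names = list(predictors)
    z = [zscore(predictors[name]) for name in names]
    zy = zscore(outcome)
    n = len(zy)
    # Correlations among predictors and between predictors and outcome.
    rxx = [[sum(p * q for p, q in zip(zi, zj)) / n for zj in z] for zi in z]
    rxy = [sum(p * q for p, q in zip(zi, zy)) / n for zi in z]
    betas = solve(rxx, rxy)
    r_squared = sum(b * r for b, r in zip(betas, rxy))
    return dict(zip(names, betas)), r_squared


# Hypothetical image properties and mean ratings for four images.
properties = {"lab_a": [1.0, 2.0, 3.0, 4.0], "lab_b": [1.0, -1.0, -1.0, 1.0]}
mean_rating = [-1.0, 7.0, 9.0, 5.0]  # equals 2 * lab_a - 3 * lab_b
betas, r_squared = standardized_betas(properties, mean_rating)
print(betas, r_squared)
```

Because all variables are standardized, the coefficients are comparable
across image properties with different units, which is what allows the
entries of Table 2 to be read side by side.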

Furthermore, to investigate to what extent the ratings by the
participants from the different clusters were predictable based on
the statistical image properties, the data were entered into a
linear mixed model analysis over images, considering the allocation
to the clusters as fixed factor, the respective mean beauty
rating for each cluster as dependent factor, and the statistical
properties of each image as covariates. We found that the
effect on the subjective rating of images differed between clusters
for the following image properties: self-similarity (F = 10.23,
p < 0.001), complexity (F = 2.29, p < 0.05), Birkhoff-like measure
(F = 10.02, p < 0.001), color hue (F = 9.61, p < 0.001),
color saturation (F = 3.68, p < 0.05), and the aspect ratio (F =
4.08, p < 0.05). Results for a detailed analysis of the differences
between the clusters with regard to the interaction between statistical
image properties and cluster membership are provided in
Table 3.

Table 2 | Regression coefficients (standardized β) for the overall correlations between average beauty ratings and the magnitudes of the
perceptual contrast, respectively, and selected image properties.

Image property              Average beauty   Perceptual contrast
Self-similarity                 -0.017            0.203*
Complexity                       0.029           -0.071
Anisotropy                       0.074           -0.164
Birkhoff-like measure            0.074            0.045
HSV color hue                   -0.007            0.059
HSV color saturation             0.150            0.001
HSV color value                 -0.060            0.079
Aspect ratio                    -0.020            0.047
Lab-L lightness                 -0.092           -0.026
Lab-a (red-green)                0.206*           0.005
Lab-b (yellow-blue)             -0.284*          -0.105

*Significant at p < 0.05.

DISCUSSION 

In this study, we demonstrate an aftereffect for perceived beauty of 
abstract artworks (Experiment 1B). Participants rated the images
as more beautiful after adaptation to least beautiful images and 
vice versa. Moreover, we correlated beauty ratings and the mag- 
nitude of the perceptual contrast with specific image properties 
that have been studied before in the context of aesthetic percep- 
tion (Experiment 2). The abstract artworks used in the present 
study have the advantage that the effect of semantic content, 
which might affect the assessment of beauty (Experiment 1A), is 
minimized. 

PERCEPTUAL CONTRAST 

Aftereffects on artworks have been shown previously, but only 
for adaptation to a single style of painting (Carbon et al., 2007) 
and for adaptation to the beauty of portrait paintings
(Hayn-Leichsenring et al., 2013). Therefore, to the best of our knowledge,
our study is the first to demonstrate a contrast effect for the per- 
ception of beauty in abstract paintings. Our findings suggest that 
perceptual contrast effects are not necessarily related to semantic 
content, but can be demonstrated for abstract images as well, i.e., 
for images that contain no (or only spurious) explicit meaning. 
This conclusion, however, remains restricted to the group of stu- 
dents with a Western cultural background, who took part in the 
present study. 

The result for cluster 5 shows a pattern opposite to the other 6 
clusters in Experiment 1B (Figure 4). In this cluster, the rating was
more negative after adaptation to least beautiful images and more 
positive after adaptation to beauty. The average rating by the par- 
ticipants in cluster 5 was 2.19, which is significantly lower than the 
mean from all the other clusters (2.81; the closest cluster has an 
average of 2.62). We tried to elucidate the reason for this unusual 
evaluation pattern. In particular, we asked whether there was an 
(inverse) correlation of the cluster-5 rating with respect to the 
other clusters. We also checked with a linear regression analysis 
whether the evaluation itself or the magnitude of the adaptation 
effect of single participants or of the entire cluster 5 correlated 
with any of the measured statistical image properties. However, 
none of the image properties analyzed seemed to have had a crit- 
ical influence on the participants' rating in cluster 5. We cannot 
exclude the possibility that participants in cluster 5 adapted to 
some other features (or combinations thereof) or shifted their
attention to criteria other than the ones we investigated. A general
lack of cooperation or attention cannot be responsible for
the inverse contrast effect. In summary, we were not able to 
clarify why members of cluster 5 differed in their adaptation 
pattern. 

In 2 of the 7 clusters that were used in Experiment 1B, the
original rating scores (Experiment 1A) were significantly lower
overall, i.e., the paintings were evaluated as less beautiful than in
Experiment 1B (p < 0.01 and p < 0.05, respectively). This difference
might be due to a familiarity bias, as suggested by
Cutting (2003). In his work on French Impressionism, he pro- 
posed that exposure to artworks of a certain style can increase 
preference for artworks belonging to the same style. For complex 
images, Berlyne (1970) observed a similar increase of preference 
that goes along with a decrease of novelty; he found the opposite 
trend for simple stimuli. Our present findings are in agreement 
with Berlyne's observation. We would argue that many of the 
abstract images shown in the experiments were rather complex 
and may have induced a high level of arousal in most of the partic- 
ipants who were non-experts and thus relatively naive with regard 
to abstract art. 

To explain the observed perceptual contrast, three possibili- 
ties should be considered. First, the effect might be the result 
of an adaptation to visual beauty. In this scenario, the per- 
ception of beauty in an individual observer is modified by 
changes in the responsiveness of the underlying neural cir- 
cuits. Second, the observed effect could be the result of a 
criterion shift, as described by Morgan et al. (2012) in their 
study on the shift of psychometric sensory discrimination func- 
tions. Following this notion, evaluation closely depends on a 
given criterion. A shift in evaluation may result from viewing 
a special set of adaptor stimuli (i.e., images rated as beautiful) in
which the criterion is especially pronounced. Third, conscious 
or unconscious comparison of the images might have biased 
the participants' ratings. Cogan et al. (2012) described such a 
comparison effect on hedonic contrast for stimuli related to 
face attractiveness. Based on the experimental data, it is not 
possible to decide which of these mechanisms (or which com- 
bination thereof) accounts for the contrast effect observed in 
the present study. In view of the short presentation times of 
the images (600 ms), it is likely that perceptual effects play a 
relatively prominent role in the observed effect compared to cog- 
nitive processes. This hypothesis is strengthened by the described 
correlation between the statistical properties (especially, PHOG 
self-similarity) and the size of the contrast effect. To discriminate 
between the role of perceptual vs. cognitive mechanisms in the 
perception of abstract art would be an interesting aim for future 
investigations. 

COLOR FEATURES

In Experiment 2, significant correlations were found between 
beauty ratings and some of the color measures (hue, satura- 
tion, and value of the HSV color space; the R and G chan- 
nels of the RGB color space; and the L and b channels of the 
Lab color space; see Table 1). These findings are in line with 
previous studies that revealed a prominent role of color for 
the aesthetic quality of images. The preference for bright, red- 
dish, and yellowish colors can be explained with the ecological 
valence theory of Palmer and Schloss (2010), who
proposed that color preferences are due to the association of colors
with preferred objects (see Introduction). Unlike the findings by 
Palmer and Schloss (2010) who showed a general preference for 
bluish colors, we found a stronger preference for yellowish art- 
works. However, this difference may well be explained by the 
different test stimuli used in the experiments (homogeneously 
colored squares vs. artworks). In another recent investigation, 
Yanulevskaya et al. (2012) showed that bright and saturated colors 
generated positive emotions, while darker colors tended to evoke 
negative emotions. Moreover, Amirshahi et al. (2013) described 
a strong correlation of beauty ratings with color quantization 
values and mean color value in a large dataset of figurative 
paintings. In conclusion, our results confirm the crucial role of 
color for appreciation of beauty in artworks, which is also evi- 
dent from the account of this topic in art history and aesthetic 
theory. 

STATISTICAL PROPERTIES OTHER THAN COLOR FEATURES 

In addition to color values, we measured other statistical prop- 
erties of images (self-similarity, complexity, anisotropy, Birkhoff- 
like measure, and aspect ratio), which have been associated with 
aesthetic judgment, and searched for correlations with beauty 
ratings. The results of the present study are in line with previous
studies that focused on representational art (e.g., Redies
et al., 2012; Braun et al., 2013). For example, the mean value
for self-similarity obtained in the present study (0.68 ± 0.13 
SD) is similar to the average value for 197 works of representa- 
tional art (0.67 ± 0.09 SD; Table 1 in Braun et al., 2013). This 
value comes close to the respective value for photographs of nat- 
ural scenes (0.64 ± 0.10 SD) and is significantly higher than 
the self-similarity of urban scenes (0.55 ± 0.08 SD) or pho- 
tographs of simple objects (0.54 ± 0.07 SD). However, other 
images like photographs of branches possess an even higher 
degree of self-similarity (0.77 ± 0.07 SD; Braun et al., 2013). It 
therefore seems possible that there is a degree of self-similarity, 
for which processing in the visual system is optimized, as pro- 
posed by Taylor et al. (2011), and which therefore might evoke 
an aesthetic response. For our analysis, we used a relatively 
new approach to measure self-similarity, the PHOG method 
(see section Experiment 2: Image Analysis). Over all partici- 
pants, self-similarity values showed no correlation with beauty 
ratings. However, in some clusters, we obtained positive correla- 
tions with beauty, and negative correlations in other clusters. A 
possible explanation for our findings might be the rather high 
self-similarity in some of the abstract images, which may have 
led to some inverse correlations with the beauty rating. Moreover, 
self-similarity has a significant main effect on clustering (Table 3), 
indicating that subgroups of persons differ in their preference for
self-similarity in abstract paintings. 

For complexity and the Birkhoff-like measure, we did not find 
any significant correlation with beauty ratings. The correlation 
between beauty appreciation and complexity is thought to be 
non-linear and to manifest itself in an inverted u-shaped response 
curve (Berlyne, 1974; Nadal, 2007; Forsythe et al., 2011). Still,
even after considering several statistical analyses to account for 
such an inverted u-shaped response curve, we were not able to 
detect any correlation. Because complexity is used for calculation 
of the Birkhoff-like measure, it is not surprising that this measure 
does not correlate with beauty either. 
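
One simple way to probe for an inverted u-shaped relation, shown here
only as an illustrative Python sketch, is to correlate the ratings with
the squared deviation of complexity from its mean in addition to the
linear term. This is an assumed approach for illustration and not
necessarily one of the analyses performed in the study; all values and
function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def quadratic_term_correlation(complexity, score):
    """Correlate scores with the squared deviation of complexity from its mean.

    A strongly negative value indicates that scores peak at intermediate
    complexity (inverted u-shape) when higher scores mean more appreciation.
    """
    center = mean(complexity)
    squared_deviation = [(c - center) ** 2 for c in complexity]
    return pearson(squared_deviation, score)


# Hypothetical data on an arbitrary scale: appreciation peaks in the middle.
complexity = [1.0, 2.0, 3.0, 4.0, 5.0]
appreciation = [1.0, 2.5, 3.0, 2.5, 1.0]

linear_r = pearson(complexity, appreciation)
quadratic_r = quadratic_term_correlation(complexity, appreciation)
print(f"linear r = {linear_r:.3f}, quadratic r = {quadratic_r:.3f}")
```

In this constructed example the linear correlation vanishes while the
quadratic term captures the relation completely, which shows how a
purely linear analysis can miss an inverted u-shaped dependence.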

Also, we did not find any correlations between beauty rat- 
ings and anisotropy. Generally, paintings of Western provenance 
are of low anisotropy (Redies et al., 2012). Our study was the 
first attempt to search for correlations of anisotropy with subjec- 
tive ratings on beauty. A causative role of anisotropy in beauty 
ratings remains to be established. Furthermore, there may be 
differences between various styles of figurative and abstract art. 
Finally, in agreement with previous findings (McManus, 1980; 
Russell, 2000), we found no correlations of beauty ratings with 
the aspect ratio of the abstract artworks. In summary, next to the 
color measures, self-similarity seems to be the best predictor for 
aesthetic appreciation of abstract artworks. 

DIFFERENCES IN BEAUTY PREFERENCES BETWEEN PARTICIPANTS 

Interestingly, some of these correlations between beauty ratings 
and image features differed between the subgroups (clusters) of 
participants (Figure 6). In particular, our results suggest that the 
participants had individual preferences for specific color com- 
binations and that there were clusters of persons with shared 
preferences. Correlations were stronger for single clusters than in 
the analysis over all participants. 

We performed the statistical analysis for sets of 3-7 clusters in 
order to look for consistent clusters. Interestingly, all sets of clus- 
ters shared the same group of 3 relatively stable clusters of similar 
preferences, which are highlighted by lightly colored backgrounds 
in Figure 6. This correspondence suggests that there are at least 
3 subgroups of participants, which differ in their preferences for 
specific types of beauty in abstract artworks. Other participants, 
who do not share preferences with these subgroups, may be allo- 
cated to the particular group they match best, although they 
may have a rather singular taste with no preference for a specific 
combination of the image features studied. Although the cluster- 
ing was performed exclusively based on subjective evaluations, 
we found differences of the effect of statistical image properties 
on the ratings by participants from the different clusters. We 
therefore propose that the clustering is not accidental but due 
to preferences for specific image properties by groups of partici- 
pants (Table 3). Of course, our sample of participants is restricted 
to students in Germany and may not be representative of the
population at large or other cultural backgrounds. 

Our findings go along with previous studies that confirmed 
interindividual differences in aesthetic appraisal. Jacobsen and
Höfel (2002) described substantial differences in individuals who
evaluated novel graphic patterns with respect to their subjective
definition of beauty. Similar to the findings from Experiment 1A,
the authors were able to represent individual patterns of judgment
more accurately with an individual judgment paradigm compared
to a group model (Jacobsen and Höfel, 2002). Interestingly,
Vessel and Rubin (2010) assumed that, as a result of shared
semantic interpretations, people show a high degree of agreement
in appraising the beauty in real-world scenes but a rather individual
taste in a non-semantic context, as is the case for abstract art.
Hence, the individual evaluation differences in the present study 
might partly be explained by the usage of abstract art. 

Augustin et al. (2012) focused on word usage for describing 
art images and obtained evidence for interindividual differences 
in aesthetic appraisal. While "beautiful" and "ugly" are terms that 
have a similar meaning for a majority of people, other adjectives 
for describing art have meanings that are more variable between 
individuals. This variable usage of words suggests differences in 
art appreciation between individuals. 

We are aware of the limited validity of the clustering results 
of the present study, as most of the participants in our study 
shared similar origin, age, and social group. Additionally, due to 
the experimental design, we presented only a limited selection of 
artworks. Future studies will have to confirm that clusters with 
distinct tendencies of preference can also be obtained for other 
cultures and social groups. 

FINAL CONCLUSION AND OUTLOOK 

In conclusion, we found a perceptual contrast effect on perceived 
beauty in abstract art images. Unlike previous adaptation studies
on simple stimuli, the present study used artworks that were
rather heterogeneous, some of them highly complex. Perhaps not
surprisingly, clusters of participants differed in their individual 
preference for these artworks, and some clusters showed pref- 
erence for a specific pattern of similar low-level image features 
present in the artworks. We hypothesize that the perceptual con- 
trast depends, at least in part, on these low-level features, which 
might be the basis for a criterion shift. However, it was impossible 
to define a single reason for the perceptual contrast observed in 
Experiment 1B.

In future studies, it will be of interest to study whether people 
prefer different image features depending on different styles of art 
or semantic content. For instance, it can be hypothesized that fea- 
tures that are related to natural depictions may be less important 
in abstract art whereas, vice versa, geometrical shapes and col- 
ors may have a stronger influence in abstract art, as they remain 
the only visual qualities that the observer can refer to. Moreover, 
in the present study, we focused exclusively on statistical image 
features that can be processed at low levels of the visual system. 
Evidently, high-level properties, such as the knowledge about the 
artist and the artistic style, also play an important role in the 
aesthetic appreciation of artworks (see, e.g., Leder et al., 2004; 
Wallraven et al., 2009).